(57) ABSTRACT

A method and apparatus for selecting hot traces for trans- 
lation and/or optimization is described in the context of a 
caching dynamic translator. The code cache stores hot 
traces. Profiling is done at locations that satisfy a start-of- 
trace condition, e.g., the targets of backward taken branches. 
A hot target of a backward taken branch is speculatively 
identified as the beginning of a hot trace, without the need 
to profile the blocks that make up the trace. The extent of the 
speculatively selected hot trace is determined by an end-of- 
trace condition, such as a backward taken branch or a 
number of interpreted or native instructions. The interpreter 
is augmented with a mode in which it emits native instruc- 
tions that are cached. A trace is cached by identifying a hot 
start of a trace and then continuing interpretation while 
storing the emitted native instruction stream until an end- 
of-trace condition is met. 
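
By way of illustration only, the selection scheme summarized above can be sketched in a few lines of Python. The sketch is not taken from the described implementation: the function name `select_traces`, the representation of execution as a list of block addresses, and the rule that a transfer to a lower or equal address counts as a backward taken branch are all assumptions made for the example. Execution from the code cache is not modelled. The default thresholds of 50 and 20 are the values the specification gives for the illustrative embodiment, with the cap counted in blocks rather than branches.

```python
HOT_THRESHOLD = 50        # hot threshold reported for the illustrative embodiment
MAX_TRACE_BLOCKS = 20     # stands in for the cap on interpreted branches (assumption)


def select_traces(block_path, hot_threshold=HOT_THRESHOLD,
                  max_blocks=MAX_TRACE_BLOCKS):
    """Return (cache, counters) after walking an executed block path.

    block_path: block start addresses in execution order (hypothetical model).
    A transfer to a lower or equal address is treated as a backward taken branch.
    """
    counters = {}    # start-of-trace address -> execution count
    cache = {}       # start-of-trace address -> tuple of blocks on the trace
    growing = None   # blocks collected so far while in grow trace mode
    prev = None
    for addr in block_path:
        backward = prev is not None and addr <= prev
        if growing is not None:
            # Grow trace mode: stop at an end-of-trace condition.
            if backward or len(growing) >= max_blocks:
                cache[growing[0]] = tuple(growing)
                growing = None
            else:
                growing.append(addr)
        if growing is None and backward and addr not in cache:
            # Normal mode: profile only targets of backward taken branches.
            counters[addr] = counters.get(addr, 0) + 1
            if counters[addr] > hot_threshold:
                del counters[addr]     # counter is no longer needed; recycle it
                growing = [addr]       # speculatively start a hot trace here
        prev = addr
    return cache, counters
```

With a threshold of 2, `select_traces([0, 1, 3] * 3 + [0, 2, 3, 0], hot_threshold=2)` caches the trace `(0, 2, 3)` even though the path 0, 1, 3 was executed more often. Only the start address was measured, so the trace that happens to follow the hot moment is the one selected.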
LOW OVERHEAD SPECULATIVE
SELECTION OF HOT TRACES IN A 
CACHING DYNAMIC TRANSLATOR 

FIELD OF THE INVENTION

The present invention relates to techniques for identifying 
portions of computer programs that are particularly fre- 
quently executed. The present invention is particularly use- 
ful in dynamic translators needing to identify candidate 
portions of code for caching and/or optimization. 

BACKGROUND 

Dynamic translators translate one sequence of instructions
into another sequence of instructions which is
executed. The second sequence consists of "native"
instructions: they can be executed directly by the machine
on which the translator is running (this "machine" may be
hardware, or it may be defined by software that is
running on yet another machine with its own architecture).
A dynamic translator can be designed to execute instructions
for one machine architecture (i.e., one instruction set) on a 
machine of a different architecture (i.e., with a different 
instruction set). Alternatively, a dynamic translator can take 
instructions that are native to the machine on which the 
dynamic translator is running and operate on that instruction 
stream to produce an optimized instruction stream. Also, a 
dynamic translator can include both of these functions 
(translation from one architecture to another, and 
optimization). 

Caching dynamic translators attempt to identify program 
hot spots (frequently executed portions of the program, such 
as certain loops) at runtime and use a code cache to store 
translations of those frequently executed portions. Subse- 
quent execution of those portions can use the cached 
translations, thereby reducing the overhead of executing 
those portions of the program. 
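
As a rough illustration of the role of the code cache, the following Python sketch keeps translations keyed by the program address at which they start. The class name `CodeCache`, its methods, and the representation of a translation as a list of strings are hypothetical and chosen only for this example; an actual translator would hold executable native code.

```python
class CodeCache:
    """Toy code cache mapping a program address to a stored translation."""

    def __init__(self):
        self._entries = {}   # start address -> translated instruction list
        self.hits = 0
        self.misses = 0

    def lookup(self, address):
        """Return the cached translation for address, or None on a miss."""
        entry = self._entries.get(address)
        if entry is None:
            self.misses += 1
        else:
            self.hits += 1
        return entry

    def store(self, address, translation):
        """Record the translation of a frequently executed portion."""
        self._entries[address] = list(translation)

    def flush(self):
        """Discard every cached translation."""
        self._entries.clear()
```

A hit lets execution proceed from the stored translation; a miss means the portion must still be interpreted or translated.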

A dynamic translator may take instructions in one instruction
set and produce instructions in a different instruction
set. Or, a dynamic translator may perform optimization,
producing instructions in the same instruction set as the
original instruction stream; thus, dynamic optimization is a
special native-to-native case of dynamic translation. Or, a
dynamic translator may do both: converting between
instruction sets as well as performing optimization.

In general, the more sophisticated the execution profiling 
scheme, the more precise the hot spot identification can be, 
and hence (i) the smaller the translated code cache space 
required to hold the more compact set of identified hot spots 
of the working set of the running program, and (ii) the less 
time spent translating hot spots into native code (or into 
optimized native code). Unless special hardware support for
profiling is provided, it is generally the case that a more
complex profiling scheme will incur a greater overhead.
Thus, dynamic translators typically have to strike a balance 
between minimizing overhead on the one hand and selecting 
hot spots very carefully on the other. 

Depending on the profiling technique used, the granularity
of the selected hot spots can vary. For example, a
fine-grained technique may identify single blocks (a
straight-line sequence of code without any intervening
branches), whereas a more coarse approach to profiling may
identify entire procedures. Since there are typically many
more blocks that are executed compared to procedures, the
latter requires much less profiling overhead (both memory
space for the execution frequency counters and the time
spent updating those counters) than the former. In systems
that are doing program optimization, another factor to consider
is the likelihood of useful optimization and/or the
degree of optimization opportunity that is available in the
selected hot spot. A block presents a much smaller optimization
scope than a procedure (and thus fewer types of
optimization techniques can be applied), although a block is
easier to optimize because it lacks any control flow
(branches and joins).

Traces offer yet a different set of tradeoffs. Traces (also 
known as paths) are single-entry multi-exit dynamic 
sequences of blocks. Although traces often have an optimi- 
zation scope between that for blocks and that for procedures, 
traces may pass through several procedure bodies, and may 
even contain entire procedure bodies. Traces offer a fairly 
large optimization scope while still having simple control 
flow, which makes optimizing them much easier than a 
procedure. Simple control flow also allows a fast optimizer 
implementation. A dynamic trace can even go past several
procedure calls and returns, including dynamically linked
libraries (DLLs). This allows an optimizer to perform
inlining, which is an optimization that removes redundant 
call and return branches, which can improve performance 
substantially. 

Unfortunately, without hardware support, the overhead
required to profile hot traces using existing methods (such as
described by T. Ball and J. Larus in "Efficient Path
Profiling", Proceedings of the 29th Symposium on Micro
Architecture (MICRO-29), December 1996) is often prohibitively
high. Such methods require instrumenting the
program binary (inserting instructions to support profiling),
which makes the profiling non-transparent and can result in
binary code bloat. Also, execution of the inserted instrumentation
instructions slows down overall program execution,
and once the instrumentation has been inserted, it is
difficult to remove at runtime. In addition, such methods
require sufficiently complex analysis of the counter values
to uncover the hot paths in the program that they are
difficult to use effectively on-the-fly while the program is
executing. All of these make traditional schemes inefficient
for use in a caching dynamic translator.

Hot traces can also be constructed indirectly, using branch
or basic block profiling (as contrasted with trace profiling,
where the profile directly provides trace information). In this
scheme, a counter is associated with the taken target of every
branch (there are other variations on this, but the overheads
are similar). When the caching dynamic translator is interpreting
the program code, it increments such a counter each
time a taken branch is interpreted. When a counter exceeds
a preset threshold, its corresponding block is flagged as hot.
These hot blocks can be strung together to create a hot trace.
Such a profiling technique has the following shortcomings:

1. A large counter table is required, since the number of 
distinct blocks executed by a program can be very 
large. 

2. The overhead for trace selection is high. The reason can 
be intuitively explained: if a trace consists of N blocks, 
this scheme will have to wait until N counters all 
exceed their thresholds before they can be strung into 
a trace. It does not take advantage of the fact that after 
the first counter gets hot, the next N-1 counters are very 
likely to get hot in quick succession, making it unnec- 
essary to bother incrementing them and doing the 
bookkeeping of the past blocks that have just executed. 
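
The two shortcomings listed above can be made concrete with a minimal Python sketch of the conventional block-counting scheme. The function names `count_all_targets` and `trace_ready` are hypothetical, and the sketch assumes for simplicity that every transition between blocks in the executed path is a taken branch.

```python
def count_all_targets(block_path, threshold):
    """Conventional scheme: one counter for every taken-branch target.

    block_path: block start addresses in execution order (hypothetical model);
    each transition to the next block is treated as a taken branch.
    """
    counters = {}
    increments = 0
    for addr in block_path[1:]:
        counters[addr] = counters.get(addr, 0) + 1
        increments += 1
    # A block is flagged hot once its own counter exceeds the threshold.
    hot = {addr for addr, count in counters.items() if count > threshold}
    return counters, hot, increments


def trace_ready(trace, hot):
    """A trace of N blocks can be strung together only when all N are hot."""
    return all(block in hot for block in trace)
```

For a four-block loop executed three times, `count_all_targets([0, 1, 2, 3] * 3, 2)` allocates four counters and performs eleven increments, yet `trace_ready` still returns `False` for the trace `[0, 1, 2, 3]` because the counter for block 0 has only reached the threshold and not exceeded it.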

SUMMARY OF THE INVENTION

According to the present invention, traces are identified as 
hot on a speculative basis, rather than based on full trace 
profile data. The series of blocks beginning at a hot
start-of-trace condition and continuing until an end-of-trace
condition is identified as a hot trace. Such a trace is identified
as hot without the need to incur the overhead of actually
measuring whether successive blocks have been executed a
sufficient number of times to be considered hot.

The identification of what constitutes the trace is accomplished
as the trace is executed. A translation of the trace is
emitted as the trace is being executed, is available for
optimization in a system that performs optimization, and is
captured in the code cache.

A particularly useful start-of-trace condition is when the
last interpreted branch was backward taken. A useful
end-of-trace condition is when one of the following three
conditions occurs: (1) the last interpreted branch was backward
taken, (2) the number of interpreted branches exceeded a
threshold value, or (3) the number of native instructions
emitted for the trace exceeded another threshold value.

Thus, according to the present invention, rather than use
higher overhead, sophisticated profiling techniques for identifying
program hot spots at runtime, profiling need only be
done at certain well-defined program addresses, such as the
targets of backward taken branches. When such an address
gets hot (i.e., its associated counter exceeds a threshold), the
very next sequence of executed blocks (or trace) is speculatively
chosen as a hot path.

This scheme speculatively selects as a hot trace the very
next sequence of interpreted blocks following certain hot
branch targets, in particular, certain branch targets that are
likely to be loop headers. Even though this scheme does not
involve elaborate profiling, the quality of the traces selected
by this technique can be excellent. One can understand why
this scheme is effective as follows: sequences of hot blocks
are very often correlated; entire paths tend to get hot in a
running program, rather than a disconnected set of blocks.

The present invention provides a mechanism for trace
selection with reduced profiling overhead.

Another advantage of the present invention is that a trace
can be constructed even when it contains indirect branches
(branches whose outcomes are known only when the branch
is executed, and which cannot be determined by simply
decoding the branch instruction). In contrast, it is awkward
for trace growing schemes that rely on branch prediction
information to deal with indirect branches, because there is
no simple way to predict the outcome of such branches.

A further advantage of the invention is that the memory
required for the storage of counters is smaller compared to
traditional profiling schemes based on branch or basic block
counting, because, with the present invention, it is not
necessary to keep track of counts for each block or for
each branch.

BRIEF DESCRIPTION OF THE DRAWING

The invention is pointed out with particularity in the
appended claims. The above and other advantages of the
invention may be better understood by referring to the
following detailed description in conjunction with the
drawing, in which:

FIG. 1 illustrates the components of a dynamic translator
such as one in which the present invention can be employed;

FIG. 2 illustrates the flow of operations in an implementation
of a dynamic translator employing the present invention; and

FIG. 3 shows program flow through four blocks of a
program, illustrating that there can be a number of different
traces starting with a common block.

DETAILED DESCRIPTION OF AN
ILLUSTRATIVE EMBODIMENT

Referring to FIG. 1, a dynamic translator includes an
interpreter 110 that receives an input instruction stream 160.
This "interpreter" represents the instruction evaluation
engine. It can be implemented in a number of ways (e.g., as
a software fetch-decode-eval loop, a just-in-time
compiler, or even a hardware CPU).

In one implementation, the instructions of the input
instruction stream 160 are in the same instruction set as that
of the machine on which the translator is running
(native-to-native translation). In the native-to-native case, the
primary advantage obtained by the translator flows from the
dynamic optimization 150 that the translator can perform. In
another implementation, the input instructions are in a
different instruction set than the native instructions.

The trace selector 120 identifies instruction traces to be
stored in the code cache 130. The trace selector is the
component responsible for associating counters with interpreted
program addresses, determining when to toggle the
interpreter state (between normal and trace growing mode),
and determining when a "hot trace" has been detected.

Much of the work of the dynamic translator occurs in an
interpreter-trace selector loop. After the interpreter 110
interprets a block of instructions (i.e., until a branch), control
is passed to the trace selector 120 to make the observations
of the program's behavior so that it can select traces for
special processing and placement in the cache. The
interpreter-trace selector loop is executed until one of the
following conditions is met: (a) a cache hit occurs, in which
case control jumps into the code cache, or (b) a hot
start-of-trace is reached.

When a hot start-of-trace is found, the trace selector 120
toggles the state of the interpreter 110 so that the interpreter
emits the trace instructions until the corresponding
end-of-trace condition (condition (b)) is met. At this point the trace
selector invokes the trace optimizer 150. The trace optimizer
is responsible for optimizing the trace instructions for better
performance on the underlying processor. After optimization
is done, the code generator 140 actually emits the trace code
into the code cache 130 and returns to the trace selector 120
to resume the interpreter-trace selector loop.

FIG. 2 illustrates operation of an implementation of a
dynamic translator employing the present invention. The
solid arrows represent flow of control, while the dashed
arrow represents generation of data. In this case, the generated
"data" is actually executable sequences of instructions
(traces) that are being stored in the translated code cache
130.

The functioning of the interpreter 110, 210, 245 in the
dynamic translator of the illustrative embodiment has been
extended so that it has a new operating state (referred to
below as "grow trace mode"): when in that new state, the
native code for a trace is emitted as a side effect of
interpretation. For a native-to-native translation, this process
of emitting instructions simply amounts to passing on the
relevant instructions from the input instruction stream 160.
For other translations, the input instructions are translated
into native instructions, and those native instructions are
recorded in a buffer. The translated native instructions are
then executed and then emitted: the buffer of translated
instructions is made available for further processing (i.e.,
optimization 255 and placement into the cache 260).
Although a block is a useful unit to translate, execute, and
emit, the interpreter may emit translated instructions in other
units, and the interpreter may perform the translate-
execute loop on one size (such as instruction or block) and
pass translated instructions on for further processing in
different units (such as a block or trace). Also, various
alternative implementations of an interpreter that emits
translated instructions are possible.

The native code emitted by the interpreter 245 is stored in
the translated code cache 130 for execution without the need
for interpretation the next time that portion of the program
is executed (unless intervening factors have resulted in that
code having been flushed from the cache). In FIG. 2, the
"normal mode" operation of the interpreter 110 is shown at
210 and the "grow trace mode" operation of the interpreter
is shown at 245.

The grow trace mode 245 of the interpreter 110 is
exploited in the present invention as a mechanism for
identifying the extent of a trace; not only does grow trace
mode generate data (instructions) to be stored in the cache,
it plays a role in the trace selection process itself. As described
above, the present invention initiates trace selection based
on limited profiling: certain addresses that meet start-of-trace
conditions are monitored, without the need to maintain
profile data for entire traces. A trace is selected based on a
hot start-of-trace condition. This selection is speculative,
because the actual trace being selected (which will be
determined as the interpreter works its way through the trace
in grow trace mode) may not be frequently executed, even
though it starts at a frequently executed starting address. At
the time a trace is identified as being hot (based on the
execution counter exceeding a threshold), the extent of the
instructions that make up the trace is not known. The process
of the interpreter emitting instructions is what maps the
extent of the trace; the grow trace mode is used to unravel
the trace on the fly.

For example, referring to FIG. 3, four blocks of a program
are shown to illustrate how identification of a trace starting
point does not itself fully identify a trace. Block A meets the
start-of-trace condition (it is the target of a backward branch
from D). With four blocks having the branching relationship
shown in FIG. 3, the following traces all share the same
starting point (A): ABCD, ABD, ACD. The trace that the
program follows at the time that the counter for A becomes
hot is the trace that is selected for storage in the cache in
response to that counter becoming hot; it could be any of
those three traces (actually, there may be more than three
possible traces, if the traces continue beyond D).

Referring to FIG. 2, the dynamic translator starts by
interpreting instructions until a taken branch is interpreted
210. At that point, a check is made to see if a trace that starts
at the taken branch target exists in the code cache 215. If
there is such a trace (i.e., a cache "hit"), control is transferred
220 to the top of that version of the trace that is stored in the
cache 130.

When, after executing instructions stored in the cache
130, control exits the cache via an exit branch, a counter
associated with the exit branch target is incremented 235 as
part of the "trampoline" instruction sequence that is
executed in order to hand control back to the dynamic
translator. When the trace is formed for storage in the cache
130, a set of trampoline instructions is included in the trace
for each exit branch in the trace. These instructions (also
known as translation "epilogue") transfer control from the
instructions in the cache back to the interpreter-trace
selector loop. An exit branch counter is associated with the
trampoline corresponding to each exit branch. Like the
storage for the trampoline instructions for a cached trace, the
storage for the trace exit counters is also allocated automatically
when the native code for the trace is emitted into the
translated code cache. In the illustrative embodiment, as a
matter of convenience, the exit counters are stored with the
trampoline instructions; however, the counter could be
stored elsewhere, such as in an array of counters.

Referring again to 215 in FIG. 2, if, when the cache is
checked for a trace starting at the target of the taken branch,
no such trace exists in the cache, then a determination is
made as to whether a "start-of-trace" condition exists 230. In
the illustrative embodiment, the start-of-trace condition is
when the just interpreted branch was a backward taken
branch. Alternatively, a system could employ different
start-of-trace conditions that combined with or did not include
backward taken branches: procedure call instructions, exits
from the code cache, system call instructions, or machine
instruction cache misses (if the hardware provided some
means for tracking such things).

A backward taken branch is a useful start-of-trace condition
because it exploits the observation that the target of a
backward taken branch is very likely to be (though not
necessarily) the start of a loop. Since most programs spend
a significant amount of time in loops, loop headers are good
candidates as possible hot spot entrances. Also, since there
are usually far fewer loop headers in a program than taken
branch targets, the number of counters and the time taken in
updating the counters is reduced significantly when one
focuses on the targets of backward taken branches (which
are likely to be loop headers), rather than on all branch
targets.

If the start-of-trace condition is not met, then control
re-enters the basic interpreter state and interpretation continues.
In this case, there is no need to maintain a counter;
a counter increment takes place only if a start-of-trace
condition is met. This is in contrast to conventional dynamic
translator implementations that have maintained counters
for each branch target. In the illustrative embodiment
counters are only associated with the address of the backward
taken branch targets and with targets of branches that
exit the translated code cache; thus, the present invention
permits a system to use less counter storage and to incur less
counter increment overhead.

If the determination of whether a "start-of-trace" condition
exists 230 is that the start-of-trace condition is met,
then, if a counter for the target does not exist, one is created
or if a counter for the target does exist, that counter is
incremented.

If the counter value for the branch target does not exceed
the hot threshold 240, then control re-enters the basic
interpreter state and interpretation continues 210.

If the counter value does exceed a hot threshold 240, then
this branch target is the beginning of what will be deemed
to be a hot trace. At this point, that counter value is no
longer needed, and that counter can be recycled
(alternatively, the counter storage could be reclaimed for use
for other purposes). This is an advantage over profiling
schemes that involve instrumenting the binary.

Because the profile data that is being collected by the
start-of-trace counters is consumed on the fly (as the program
is being executed), these counters can be recycled
when their information is no longer needed; in particular, once
a start-of-trace counter has become hot and has been used to
select a trace for storage in the cache, that counter can be
recycled. The illustrative embodiment includes a fixed size
table of start-of-trace counters. The table is associative:
each counter can be accessed by means of the start-of-trace
address for which the counter is counting. When a counter
for a particular start-of-trace is to be recycled, that entry in
the table is added to a free list, or otherwise marked as free.

The lower the threshold, the less time is spent in the
interpreter, and the greater the number of start-of-traces that
potentially get hot. This results in a greater number of traces
being generated into the code cache (and the more speculative
the choice of hot traces), which in turn can increase the
pressure on the code cache resources, and hence the overhead
of managing the code cache. On the other hand, the
higher the threshold, the greater the interpretive overhead
(e.g., allocating and incrementing counters associated with
start-of-traces). Thus the choice of threshold has to balance
these two forces. It also depends on the actual interpretive
and code cache management overheads in the particular
implementation. In our specific implementation, where the
interpreter was written as a software fetch-decode-eval loop
in C, a threshold of 50 was chosen as the best compromise.

If the counter value does exceed a hot threshold 240, then,
as indicated above, the address corresponding to that counter
will be deemed to be the start of a hot trace. At the time the
trace is identified as hot, the extent of the trace remains to
be determined (by the end-of-trace condition described
below). Also, note that the selection of the trace as "hot" is
speculative, in that only the initial block of the trace has
actually been measured to be hot.

At this point, the interpreter transitions from normal mode
210 to grow trace mode 245. In this mode, as described
above, interpretation continues, except that as instructions
are interpreted, the native translation of the instructions is
also emitted so that they can be stored in the code cache 130.
The interpreter stores the native instructions into a buffer.
When an end-of-trace condition is reached 250, the buffer
with the complete trace is handed to an optimizer 255. After
optimization, the optimized native instructions are placed
into the cache, and the counter storage associated with the
trace's starting address is recycled 260. (Alternatively, the
counter storage could be recycled as early as when the
counter has been determined to exceed the hot threshold.)
Also, triggered by the end-of-trace condition, the interpreter
110 transitions back to the normal interpreter state.

An end-of-trace condition is simply a heuristic that says
when to stop growing the current trace. The following are
some examples of possible end-of-trace conditions:
ending a trace when a backward taken branch is reached
avoids unfolding cycles unnecessarily and also captures
loops; a "return" branch can be a useful end-of-trace because
it can indicate the end of a procedure body; generally, it is
desirable to trigger an end-of-trace if a new start-of-trace
has occurred.

In the illustrative embodiment, the end-of-trace condition
is met when (a) a backward taken branch is interpreted, or
(b) a certain number of branch instructions has been
interpreted (in the illustrative embodiment this number is
20) since entering the grow trace mode (capping the number
of branches on the trace limits the number of places that
control can exit the trace; the greater the number of
branches that can exit the trace, the less the likelihood that
the entire trace is going to be utilized and the greater the
likelihood of an early trace exit), or (c) a certain number of
native translated instructions has been emitted into the code
cache for the current trace. The limit on the number of
instructions in a trace is chosen to avoid excessively long
traces. In the illustrative embodiment, this is 1024
instructions, which allows a conditional branch on the trace
to reach its extremities (this follows from the number of
displacement bits in the conditional branch instruction on
the PA-RISC processor, on which the illustrative embodiment
is implemented).

Although the cache can be sized large enough so that
replacement of entries is not required, typically a replacement
scheme will be used. One approach is to flush the cache
when it is full and space for a new entry is needed. However,
another approach that offers advantages is to flush the cache
preemptively, based on some indication that the program's
working set is changing. Such a preemptive approach is
described in the co-owned application titled "A Preemptive
Replacement Strategy For A Caching Dynamic Translator",
Sanjeev Banerjia, Vasanth Bala, and Evelyn Duesterwald,
filed the same date as the present application.

When a trace is removed from the code cache, the
memory used for counter storage for each of the trace's exit
branches is automatically recovered. Thus, the storage for
these exit branch target counters is "free" in that sense,
because they do not have to be independently allocated and
managed like the other counters associated with interpreted
branch targets (those targets that have met start-of-trace
conditions, but for which the associated counter has not yet
exceeded the "hot" threshold); as discussed above, the exit
branch target counters are allocated as a part of creating the
trampoline for the exit branch.

In the illustrative embodiment, FIGS. 1 and 2 are related
as follows; one skilled in the art will appreciate that these
functions can be organized in other ways in other implementations.
The interpreter 110 implements 210 and 245.
The code generator 140 implements 260. The trace optimizer
150 implements 255. The trace selector 120 implements
215, 220, 230, 235, 240, and 250.

The illustrative embodiment of the present invention is
implemented as software running on a general purpose
computer, and the present invention is particularly suited to
software implementation. Special purpose hardware can also
be useful in connection with the invention (for example, a
hardware "interpreter", hardware that facilitates collection of
profiling data, or cache hardware).

The foregoing has described a specific embodiment of the
invention. Additional variations will be apparent to those
skilled in the art. For example, although the invention has
been described in the context of a dynamic translator, it can
also be used in other systems that employ interpreters or
just-in-time compilers (JITs). Further, the invention could be
employed in other systems that emulate any non-native
system, such as a simulator. Thus, the invention is not
limited to the specific details and illustrative example shown
and described in this specification. Rather, it is the object of
the appended claims to cover all such variations and modifications
as come within the true spirit and scope of the
invention.

We claim:

1. In a dynamic translator, a method for selecting hot
traces in a program being translated comprising the steps of:

(A) dynamically associating counters with addresses in
the program being translated that are determined, during
program translation and execution, to meet a
start-of-trace condition;

(B) when an instruction with a corresponding counter is
executed, incrementing that counter; and

(C) when a counter exceeds a threshold, determining the
particular trace (of the possible plurality of traces
beginning at that address) that begins at the address
corresponding to that counter and is defined by the path
of execution taken by the program following that
particular execution of that instruction and continuing
until an end-of-trace condition is met and identifying
that trace as a hot trace. 

2. The method of claim 1 in which the dynamic translator
includes an interpreter that can be switched between a
normal mode and a grow trace mode in which the interpreter
emits native instructions as a side-effect of interpretation, the
method further comprising the steps of:

(D) when a counter exceeds a threshold, switching the 
interpreter to grow trace mode; 

(E) when the interpreter is in grow trace mode and an 
end-of-trace condition is met, switching the interpreter 
to normal mode; 

(F) using the instructions emitted by the interpreter to 
determine the trace that is to be identified as a hot trace. 

3. The method of claim 1 in which, in response to 
identifying a trace as a hot trace, the corresponding counter
is recycled. 

4. The method of claim 1 in which the start-of-trace 
condition is when the last interpreted branch was backward
taken. 

5. The method of claim 2 in which the start-of-trace 
condition is when the last interpreted branch was backward 
taken. 

6. In a dynamic translator comprising:

(A) counters for storing counts of the number of times that 
instructions are executed at addresses associated with 
the counters; 

(B) means for identifying addresses that meet a start-of- 
trace condition and for associating such addresses with
counters; 

(C) means for determining when a counter exceeds a 
threshold; and 

(D) trace identification means for identifying a series of 
instructions executed following the instruction at the 
address associated with the counter as a hot trace.

7. The dynamic translator of claim 6 further comprising 
an interpreter that can be switched between a normal mode 
and a grow trace mode in which the interpreter emits native 
instructions as a side-effect of interpretation, and in which 
the interpreter is switched to the grow trace mode in
response to a counter exceeding a threshold, and in which
the trace identification means uses the emitted instructions in 
its identification of a hot trace. 

8. The translator of claim 6 in which, in response to 
identifying a trace as a hot trace, the corresponding counter 
is recycled. 

9. The translator of claim 6 in which the start-of-trace 
condition is when the last interpreted branch was backward 
taken. 

10. The translator of claim 7 in which the start-of-trace 
condition is when the last interpreted branch was backward 
taken. 

11. In a dynamic translator, a method for selecting hot 
traces comprising the steps of: 

(A) maintaining counts for addresses that meet a start-of- 
trace condition; 

(B) detecting when one of these counters exceeds a
threshold; 

(C) in response to a counter exceeding a threshold, 
identifying as a hot trace the instructions beginning 
with the address with which that counter was associated
and continuing until reaching an end-of-trace condition.

12. In a dynamic translator having a cache, a method for 
selecting hot traces comprising the steps of: 

(A) when a backward branch is taken, if a counter does 
not exist for the branch target, then creating such a 
counter, if such a counter does exist, then incrementing
the counter; 

(B) if a counter exceeds a threshold, then storing in the 
cache a translation of those instructions executed start- 
ing at the branch target associated with the counter that 
exceeded the threshold and continuing until an end-of- 
trace condition is reached. 

13. The translator of claim 12 in which, when a translation 
is stored in the cache, the corresponding counter is recycled. 

* * * * *